1. Introduction

It is well known that testing affects what is taught in the schools. As nationwide tests of 
math skills or reading comprehension become established, they become the standards by which 
school systems, teachers, and students are judged. They unconsciously dictate what students 
should learn, and so education in the schools begins to point toward teaching the skills necessary 
to do well on these tests. The most flagrant examples of this effect are the courses that prepare
students to take the Scholastic Aptitude Test for college entrance, but the effect is far more
pervasive, in subtler ways, throughout the schools.

This phenomenon has potentially beneficial side effects in that the tests form uniform 
standards by which we can compare different schools, teachers, and students. They establish for 
all to see, as it were, what is expected of students, and hence point the nation's students and 
teachers toward specific objectives that can be debated, quantified, and applied equally to all. 

But there are insidious side effects that are less well known. First, there is the tendency for
testing to drive teaching down to the level of our testing technology, away from learning and
reasoning skills and toward more easily measurable skills that can be tested by multiple-choice items.
Second, testing encourages students to adopt memorization rather than understanding as their
goal: knowledge is learned in a form in which it can be recalled rather than in a form in which it
can be used in real-life tasks. Finally, there is a kind of test-taking mentality that takes over and
helps turn many students against school and against learning more generally. Each of these issues
needs some amplification.

Education in service of what can be measured. There is a growing disparity between
what we think we should be teaching students and what we actually are teaching. This disparity
is reflected in the concern in the education community about teaching critical thinking and
metacognitive skills (e.g., Glaser, 1984). I suspect the disparity arises mainly from our escalating
expectations for the schools. Cuban (1984) reports that, in historical terms, there was more
emphasis on rote skills and memorization in former years than now. But machines are taking
over the low-level jobs in our society, leaving more and more demand for the kinds of reasoning
that only humans are capable of.

However, reasoning and metacognitive skills are the most difficult skills to measure. They
include the skills of planning, monitoring your processing during a task, checking what you have 
done, estimating what a reasonable answer might be, actively considering possible alternative 
courses of action, separating relevant information from irrelevant information, choosing 
problems that are useful to work on, asking good questions, etc. These are skills that current 
tests for the most part do not measure, nor is it easy to see how they could be measured within a 
single-item, multiple-choice format. 

But these are the kinds of skills that have the most payoff in teaching, as evidenced by the 
success of Reciprocal Teaching and other "cognitive apprenticeship" methods (Collins, Brown, 
and Newman, in press; Palincsar and Brown, 1984; Schoenfeld, 1983, 1985). For example, 
Palincsar and Brown (1984) produced huge gains in students' reading comprehension using their
Reciprocal Teaching method, which taught students (1) to formulate questions about texts, (2) to
summarize texts, (3) to clarify difficulties with texts, and (4) to make predictions about what is
coming next in texts. These skills are critical to the ability to monitor one's reading
comprehension (Collins, Brown, and Newman, in press), but they are not the kinds of skills that
are easily measured. To the degree that testing technology drives education, it will drive
teaching away from these higher-order thinking skills toward the lower-order skills that can be
measured easily.

Moreover, to the degree we attempt to develop tests that are truly diagnostic, we may
exacerbate the problem even more. There have been some great successes in our ability to
identify systematic student errors in arithmetic and algebra (Brown and Burton, 1978; Brown
and VanLehn, 1980; Matz, 1982; Sleeman, 1982; Tatsuoka, this volume). One notion afoot is
that since we can diagnose the precise errors students are making, we can then teach directly to
counter these errors. Such diagnosis might indeed be useful in a system where diagnosis and
remediation are tightly coupled, as for example in the LISP tutor described by Anderson (this
volume). But if diagnosis becomes an objective in nationwide tests, then this will drive
education toward the lower-order skills for which we can do the kind of fine diagnosis possible for
arithmetic. Such an outcome would be truly disastrous. It is precisely the kinds of skills for
which we can do fine diagnosis that are becoming obsolete in the computational world of today.

Conventional testing promotes memorization rather than understanding. Tests are
one of the great incentives for students to study. But they lead students into the worst kinds of 
study strategies. When students learn information for tests, they are developing strategies and 
memorizing information in forms that are of little or no use for real world problem solving. 

For example, much of students' studying involves memorizing information or procedures
that they think they will be asked about on a test (Schoenfeld, in press). This leads to the problem of
"inert" knowledge (Collins, Brown, and Newman, in press). Facts and procedures are learned in
isolation, apart from the different contexts in which they might be used. As we argued in the
earlier paper, learning of information and procedures needs to be "situated" in multiple contexts
reflecting its different uses in real-world contexts. Otherwise students are not likely to see how
the knowledge they are getting can be applied. The "what" of knowledge is only a third of what
needs to be learned; students also need to learn "when" and "how" it applies in different contexts,
or else we will find that they cannot transfer what they have learned to new but relevant
contexts.

Schoenfeld (1985) identified a related problem among math students taking tests that cover
course material. They develop strategies for what to do based on idiosyncrasies of the
course and test problems. For example, if an answer doesn't come out to a whole number, they
think it is wrong. And the methods they consider using are governed by what material the test
covers (e.g., addition of fractions, algebra work problems), rather than drawn from all the methods
they have learned up to then in their mathematics courses. Thus they evolve solution methods
that are counterproductive for solving real-world problems.

In summary, when testing becomes the raison d'etre for learning, students develop 
memorization strategies that lead to decontextualized knowledge that cannot be applied later in 
relevant contexts. Furthermore, they learn problem solving strategies that are counterproductive 
for real world problems. 

Conventional testing fosters a mentality that turns some students against learning.
Even more subtle and insidious than the previous two side effects is what happens when poorer
students see rewards and success going to the students who do well on tests, and disapproval to
themselves (cf. Dweck, 1986). They come to regard learning as synonymous with doing well on
tests, and since they do not do well on tests, they do not want to compete in what they perceive
will be a losing battle. In consequence, they come to regard education as irrelevant to their
interests in life (e.g., becoming an athlete or beautician), and as boring compared with, say, their
social life or athletics. This is not to say that removing testing from the schools would
completely alleviate the problem of students' loss of motivation to learn anything in school,
since teachers' expectations undoubtedly contribute to students' negative self-image as learners
(Rosenthal and Jacobson, 1968). But tests are a major contributing factor, since they are the
means by which students are publicly labeled as inferior.

2. Desiderata for a New Kind of Testing



There are five desiderata that I view as critical for a new, more benign learning and testing
environment. They may not all be attainable, but they serve as goals to strive toward in
redesigning testing:

1. Tests should emphasize learning and thinking. A test in any domain should
emphasize higher-order thinking skills in that domain: in particular, problem 
solving strategies (i.e. heuristics), self-regulatory or monitoring strategies, and 
learning strategies (Collins, Brown & Newman, in press). Dynamic testing 
(Campione & Brown, this volume) goes some way toward centering testing on just 
such issues. These higher-order skills are what we want students to learn, and so 
tests must focus on them. 

2. Tests should require generation as well as selection. Most tasks in the real world
require planning and executing, but multiple choice tests only require choosing the 
best answer. Hence they cannot in fact measure critical aspects of thinking. So it 
is important that tests require generation of ideas by students (Frederiksen, 1984). 

3. Tests should be integral to learning. As tests are presently construed, students stop learning
when they take a test. Occasionally they may learn something by going over a test,
but this happens only rarely. The major positive effects of tests on learning, then,
are the motivational effects, and these occur mainly with teacher-generated tests. 
Ideally tests should not be intrusive to learning, but rather integral to it. This is 
perhaps the most difficult of the five desiderata to achieve. 

4. Tests should serve multiple purposes. I have alluded to some of the purposes
served by tests, and other researchers (cf. Linn, 1986) have tried to enumerate such 
purposes. Let me list some of those purposes lest they be overlooked: a) 
motivating students to study and directing that study to certain topics or issues, b) 
diagnosing what difficulties students are having and selecting what they should 
study next, c) placing students in classes, grades, schools, and jobs, d) reporting to 
students, teachers, and parents on the progress a student has made, and e) 
evaluating how well a teacher, school, or school system is doing vis-à-vis other
teachers, schools, and systems. There may be other purposes for testing, but these
are the major purposes. In some sense testing must serve all of these purposes in 
one way or another. 



ERIC 



BBN Laboratories Incorporated 



5. Tests should be valid with respect to all their purposes. Test makers have worried
a great deal about the reliability and validity of tests. But for the most part their
concerns about validity have been only with content validity and with the predictive validity
of the tests with respect to future schooling. We need to be much more concerned
about the validity of tests with respect to the other purposes of testing. For
example, do the tests really measure the effectiveness of teaching? Do they
motivate students to learn the kinds of knowledge and higher-order skills we want
children to learn? As I said in the Introduction, there are reasons to doubt that they
do.

Furthermore, as pointed out earlier, when tests become more critical in making
decisions, teachers and students direct their teaching and learning toward doing well on the
test. Then tests lose validity. That is, to the degree test validity depends on factors
that coincide with, but are not the same as, the skills required in the future school
or job, preparing for the test reduces the predictive validity of the test. Thus
what starts out as a highly valid test may lose validity as it becomes more visible or
decisive in making selections.

Suppose for example that aptitude for college is measured with a vocabulary test.
Normally a vocabulary test might be a very good predictor of how well someone
will do in college, because people who read and study acquire a large vocabulary in
the process. But then suppose it becomes known that students will be selected for
college, or that teachers will be evaluated for effectiveness, on the basis of such a
test. Then it behooves students or teachers to concentrate on vocabulary, which
is a relatively easy thing to learn as compared to, say, an understanding of algebra or
literature. When that happens, the vocabulary test ceases to be a good predictor of
how someone will do in college. In fact, better students are likely to regard
learning vocabulary as cheating while lesser students will regard it as necessary for
survival, so the test might even become negatively predictive if enough attention is
focused on it. Furthermore, students will concentrate their energies not on learning
what is most valuable for future life, but rather on what is at best a superficial
index of learning (i.e., vocabulary). This is what Frederiksen (1984) refers to as the
"real test bias".

In this example, it is possible to substitute any number of things for vocabulary.
For example, Raven's matrices or analogy problems are probably quite good
measures of general problem solving ability, unless students practice on such
items. If they do, they can learn (or be taught) the patterns by which such items are
constructed, and so they do not then have to figure out nearly as much when they
come to take the test. So again such tests will lose their predictive validity if they
become the focus of study. The only way to prevent such an occurrence is to have
tests that reflect all the knowledge, skills, and strategies necessary for success in
college or whatever outcome the tests are designed to predict.
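The vocabulary argument can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not data from any study: it assumes a latent "ability" that drives both vocabulary and college performance, and models coaching as a vocabulary boost unrelated to ability. The function names and parameter values (`simulate`, `coaching_sd`, the 0.5 noise level) are invented for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=5000, coaching_sd=0.0, seed=1):
    """Correlation of vocabulary score with college outcome in a toy model.

    Hypothetical model: ability ~ N(0, 1); vocabulary = ability + noise;
    outcome = ability + noise. Coaching adds a vocabulary boost that is
    independent of ability, with standard deviation `coaching_sd`.
    """
    rng = random.Random(seed)
    vocab, outcome = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        boost = rng.gauss(0, coaching_sd) if coaching_sd > 0 else 0.0
        vocab.append(ability + rng.gauss(0, 0.5) + boost)
        outcome.append(ability + rng.gauss(0, 0.5))
    return pearson(vocab, outcome)


uncoached = simulate(coaching_sd=0.0)
coached = simulate(coaching_sd=1.5)
print(f"no coaching:   r = {uncoached:.2f}")
print(f"with coaching: r = {coached:.2f}")
```

Under this model the population correlation is 0.8 without coaching and about 0.48 with it, so the sample values should land near those figures. Nothing about the students' college performance changed; only the test score became contaminated by effort spent on the test itself.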

3. Two Scenarios for a New Testing and Learning Environment

Testing is undoubtedly necessary in a complex society where we need to make decisions
about who should go to what schools, who should do what jobs, and what should be taught to
different students. The fundamental question about testing, then, as I see it, is: how can we
construct an educational system that embodies testing in a form that sustains its necessary
functions and at the same time alleviates the problems that the current system has generated?

The three papers I have been asked to comment upon, by Anderson, by Frederiksen and White,
and by Kieras, point the way to a possible answer. In the rest of the paper I want to elaborate at
some length how it is possible to take the ideas implicit in intelligent tutoring systems, and
educational computer systems more generally, to construct a new kind of learning and testing
environment.

There are two scenarios I can envision for exploiting the potential of computer systems for 
testing. The first, more conservative scenario is partly depicted in the papers by Frederiksen and 
White and by Kieras. It is summarized nicely in Frederiksen and White's title "Intelligent Tutors 
as Intelligent Testers" and it goes some way toward addressing the desiderata outlined in the 
previous section. The second, more radical scenario envisions a completely integrated learning
and testing environment.

Intelligent tutors as intelligent testers. In this first scenario, intelligent tutoring systems
become the devices for administering tests to students. The tests would be problem solving tests,
in which problems differing in difficulty are given to students. The test would start with easier
problems and, depending on how well the student does, the subsequent problems would be easier
or more difficult, as with adaptive testing.
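The adjustment rule described above can be sketched as a simple up/down staircase. This is an illustrative assumption, not the procedure of any system discussed here: operational adaptive tests usually select items with item response theory, which this sketch does not attempt. `solves` is a hypothetical stand-in for the tutor presenting a problem at a given difficulty and judging the student's solution.

```python
from typing import Callable, List, Tuple


def adaptive_test(
    solves: Callable[[int], bool],
    n_items: int = 10,
    start_level: int = 1,
    min_level: int = 1,
    max_level: int = 10,
) -> List[Tuple[int, bool]]:
    """Give n_items problems, raising or lowering difficulty after each one.

    Returns the history of (difficulty level, solved?) pairs.
    """
    level = start_level
    history = []
    for _ in range(n_items):
        solved = solves(level)
        history.append((level, solved))
        # Staircase rule: one step harder after a success, one easier after a failure.
        if solved:
            level = min(level + 1, max_level)
        else:
            level = max(level - 1, min_level)
    return history


def estimate_level(history: List[Tuple[int, bool]]) -> float:
    """Crude ability estimate: mean difficulty of the items administered."""
    return sum(level for level, _ in history) / len(history)


# A hypothetical student who can solve anything at difficulty 4 or below.
history = adaptive_test(lambda level: level <= 4, n_items=8)
print(history)
print(estimate_level(history))
```

Once the staircase reaches the student's limit, it oscillates around it (here between levels 4 and 5). The crude estimate of 3.5 is pulled down by the easy warm-up items, so a practical version would discard them or use a proper statistical model.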

As they solve problems, students can be given cognitive feedback on how best to solve 
these kinds of problems. That is, the full capability of the tutoring system to teach the students 
can be employed as part of the testing procedure. The test then will measure not simply their
prior ability to perform the kind of tasks given by the system, but in addition it will measure how
well they can learn to perform these tasks given precisely specified cognitive feedback and
advice. This gives the intelligent testing system the same kind of capabilities for measurement
that Campione and Brown (this volume) have developed in their work on dynamic assessment.

Intelligent tutoring systems require the student to generate entire sequences of actions that 
lead to solutions, whether the problems are programming problems as in Anderson's LISP Tutor, 
or electricity problems as in Frederiksen and White's Quest tutor, or operational problems as in 
Kieras's phaser control system. While the responses allowed by tutoring systems are not open-ended
(i.e., there is usually a restricted class of inputs that the system can process), they are not
single-item, multiple-choice response formats. Thus, the responses required by intelligent
tutoring systems are generative in the sense implied in the desiderata listed earlier, but at the
same time they are precise enough to be evaluated according to the well-defined criteria necessary
for constructing tests.

Scoring in such a system can be based on the same kinds of measures now used to evaluate
problem solving: percent correct in solving problems, average time to solve problems, number of
incorrect vs. correct steps taken in attempting to solve a problem, etc. But to the degree a
system has a characterization of what expert performance requires, as in Anderson's LISP Tutor
or Frederiksen and White's Quest, it is possible to evaluate students more directly. In the case of
the LISP Tutor, the system has an idealized problem solving model consisting of some 325
production rules, which represent its strategies for solving programming problems. As students
work problems, the system can evaluate the degree to which each of these productions is used
where appropriate; this gives a measure of how much of the expert model has been acquired.
Similarly, it might also be possible to assess how well the students have learned to suppress 
those productions in the system representing particular misconceptions. For the Quest System, 
the student's level of performance can be evaluated in terms of how far along the progression of 
more and more sophisticated models a student has advanced. In either case it should be possible 
to measure both the student's current level of understanding of the domain, and the rate at which 
he or she is learning with the tutoring system. 
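The idea of measuring how much of an expert production-rule model a student has acquired can be sketched as follows. This is a hypothetical illustration, not the LISP Tutor's actual bookkeeping: it assumes the tutor logs one `(rule_name, applied_correctly)` entry per opportunity to use a rule, and it calls a rule "acquired" when its success rate over the last few opportunities clears a threshold. The rule names, window size, and threshold are all invented for the example. The same counting could be applied to productions that represent misconceptions, to see whether they stop firing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# One log entry per opportunity to apply a rule: (rule_name, applied_correctly).
LogEntry = Tuple[str, bool]


def rule_success_rates(log: Iterable[LogEntry], last_k: int = 3) -> Dict[str, float]:
    """Success rate of each rule over its most recent `last_k` opportunities."""
    by_rule: Dict[str, List[bool]] = defaultdict(list)
    for rule, ok in log:
        by_rule[rule].append(ok)
    rates = {}
    for rule, outcomes in by_rule.items():
        recent = outcomes[-last_k:]
        rates[rule] = sum(recent) / len(recent)
    return rates


def fraction_of_model_acquired(
    log: Iterable[LogEntry],
    expert_rules: Sequence[str],
    threshold: float = 0.6,
    last_k: int = 3,
) -> float:
    """Share of the expert model's rules the student now applies reliably.

    A rule the student has never had occasion to use counts as not acquired.
    """
    rates = rule_success_rates(log, last_k)
    acquired = [r for r in expert_rules if rates.get(r, 0.0) >= threshold]
    return len(acquired) / len(expert_rules)


# Hypothetical three-rule "expert model" and a short session log.
expert_rules = ["define-function", "recurse-on-cdr", "base-case-null"]
log = [
    ("define-function", False), ("define-function", True),
    ("define-function", True), ("define-function", True),
    ("recurse-on-cdr", False), ("recurse-on-cdr", True), ("recurse-on-cdr", True),
    ("base-case-null", False), ("base-case-null", False),
]
print(rule_success_rates(log))
print(fraction_of_model_acquired(log, expert_rules))
```

Recomputing this fraction session by session would give both quantities mentioned above: the student's current level and the rate at which it is rising.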

Because the systems can analyze sequences of actions, they have a capability to measure 
strategic skills as well as domain skills. For example, Anderson, Boyle, and Reiser's (1985) 
Geometry Tutor allows students to work forward from the givens or backwards from the 
statement to be proved in constructing geometry proofs. One good strategy to learn is first to
work forward from the givens a little way to see their implications and then to work backwards 
from the statement to be proved in order to close the gap. A good "metacognitive strategy" is 
that when you are stuck working forward or backwards (which might be indicated by a long 
pause), switch to working the other way. In a system such as the Geometry Tutor, it would be 
possible for the system to analyze sequences of actions (and pauses) by the students, to make 
suggestions as to what are good strategies, and to evaluate how well students learn to approach 
problems strategically (Collins & Brown, in press). 
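The kind of strategy analysis just described might look like the sketch below. It assumes a hypothetical log format, not one the Geometry Tutor is known to produce: one `(direction, pause_seconds)` entry per proof step, where direction is `"forward"` or `"backward"` and the pause is the time before that step. It checks two things: whether the student opens by working forward and soon turns backward, and how often a long pause (taken as a sign of being stuck) is followed by a switch of direction. The 30-second cutoff and the three-step lookahead are arbitrary assumptions.

```python
from typing import List, Tuple

# Hypothetical log: one (direction, pause_before_step_in_seconds) per proof step.
Step = Tuple[str, float]


def switch_after_stuck_rate(steps: List[Step], stuck_pause: float = 30.0) -> float:
    """Fraction of long pauses after which the student switched direction."""
    stuck = switched = 0
    for (prev_direction, _), (direction, pause) in zip(steps, steps[1:]):
        if pause >= stuck_pause:
            stuck += 1
            if direction != prev_direction:
                switched += 1
    return switched / stuck if stuck else float("nan")


def opens_forward_then_backward(steps: List[Step], lookahead: int = 3) -> bool:
    """True if the proof starts forward from the givens and turns backward soon after."""
    if not steps or steps[0][0] != "forward":
        return False
    return any(direction == "backward" for direction, _ in steps[1 : 1 + lookahead])


steps = [
    ("forward", 5), ("forward", 8), ("backward", 40),
    ("backward", 6), ("backward", 35), ("forward", 4),
]
print(switch_after_stuck_rate(steps))      # one of the two long pauses led to a switch
print(opens_forward_then_backward(steps))
```

A coach built on measures like these could comment when a student sits stuck in one direction, and the same numbers tracked over sessions would show whether the student is learning to approach problems strategically.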

A serious limitation of today's intelligent tutoring systems is that they only exist in the 
domains of math and science. This is because computational techniques provide the most 
leverage in these domains. One question then is whether computer systems have any role to play 
as testing systems in domains such as reading, writing, and history. There are in fact less-than- 
intelligent computer-based teaching systems in these three domains that might be useful. 

For example, in the domain of reading, the IRIS system (developed by WICAT and described
in Collins, 1986) presents passages to students and then asks questions about the passages, much
like a reading comprehension test. But it is an instructional system, so that students receive 
cognitive feedback on what they do that should help them learn to read better. My reservation 
about this particular system is that there is not as much instruction on how to make inferences or 
monitor one's comprehension as there is in the best comprehension instruction (e.g., Palincsar & 
Brown, 1984). But the structure is potentially there to do so. 

The most relevant computer system for testing writing is Writer's Workbench (McDonald, 
Frase, Gingrich, & Keenan, 1982), but it is more an advisory system than a teaching system. It 
can analyze texts in terms of spelling, word usage, and grammar; it can even evaluate overusage 
of the passive voice, the frequency of empty phrases like "there are", and the readability of the text
by standard readability measures. But essentially it is only evaluating surface features of the
text: it cannot evaluate clarity, interest, persuasiveness, or memorability, which are the critical
aspects that a good text must possess (Collins & Gentner, 1980). Designing testing around a
system that only evaluated surface features would lead the teaching of writing in the wrong 
direction. But that is all that computer-based systems will be capable of evaluating in the 
foreseeable future. 

The most ingenious computer-based teaching system for history is Geography Search,
developed by Tom Snyder (Kelman et al., 1981). It is a historical simulation of the time after
Columbus discovered America, when explorers sailed to the New World to bring back its wealth
and resources. In the simulation students have to purchase supplies for their trip to the New
World, and navigate using sextant and compass. They must plan their voyage as they go,
depending upon what they find and how many supplies they have left for the voyage home.
Historical simulations, such as Geography Search or the Civil War Game by Avalon Hill, give
students an understanding of the reasons why events take place in a historical context. While
Geography Search does not do so, it would certainly be possible to provide a computer coach to 
advise students as they engage in these simulations. In such a scenario, it would be possible to 
evaluate how well students learn to plan and solve problems in historical contexts. This is not 
what we usually test about students' understanding of history, but it is perhaps an equally valid 
kind of historical understanding. Moreover, most of the important concerns of history, such as 
the development of the Constitution or the settling of the American frontier, can be turned into 
historical simulations. 

In summary, the plan to develop intelligent tutors as intelligent testers is feasible in much of
the current school curriculum. It has several benefits: (a) testing would be focused on students'
problem solving and planning skills, (b) their ability to learn in a domain as well as their prior
knowledge could be tested, and (c) the tests could be adaptive to the student's prior knowledge,
and would test their generative abilities instead of their recognition abilities. But such a
scenario, while feasible, would require a large amount of effort to produce intelligent testing 
systems that cover a large part of the curriculum. Computer-based teaching systems will in fact 
be developed to cover much of the school curriculum in the next decade, given the expansion of 
tools and resources that is taking place in the field. Whether these will be extended to address 
testing concerns is still an open question, however. 

An integrated learning and testing environment . The second, more radical scenario for an 
integrated learning and testing environment is implicit in the way Anderson (this volume) has 
analyzed students' learning in the LISP tutor. The tutor was built to teach students LISP, but as 
a side effect of the teaching, Anderson collected a record of their performance with the tutor that 
he could analyze to test various hypotheses. Each analysis is a slice through the data to answer 
certain questions. He can look at students' learning curves, error rates, response times, and even
factor out differences between their ability to learn versus their ability to remember. That is, the 
computational medium enables evaluation to be carried out on the process of learning. Rather 
than stopping to take a test, the testing comes free in the course of the teaching. 
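As an illustration of testing "for free", here is one way a learning curve could be sliced out of a tutor's interaction record. This is a sketch under assumptions, not Anderson's analysis: it assumes a hypothetical chronological log of `(student, rule, correct)` records and computes the error rate at the first, second, third, and later opportunities to use a rule, pooled over students and rules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log record: (student, rule, correct), in chronological order.
Record = Tuple[str, str, bool]


def learning_curve(log: List[Record]) -> Dict[int, float]:
    """Error rate at the 1st, 2nd, 3rd... opportunity to apply a rule.

    Opportunities are counted separately for each (student, rule) pair;
    errors are then pooled across pairs at each opportunity number.
    """
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for student, rule, correct in log:
        seen[(student, rule)] += 1
        k = seen[(student, rule)]
        totals[k] += 1
        if not correct:
            errors[k] += 1
    return {k: errors[k] / totals[k] for k in sorted(totals)}


log = [
    ("ann", "cons", False), ("ann", "cons", True), ("ann", "cons", True),
    ("bob", "cons", False), ("bob", "cons", False), ("bob", "cons", True),
]
print(learning_curve(log))
```

Filtering the same record by student, by rule, or by session would answer other questions without the student ever stopping for a test.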

To my knowledge, this view of testing first evolved in a cognitive science working group at
a conference on testing (Tyler and White, 1980). The analogy we used was to professional
sports like baseball or football where extensive records are kept on players (by scorekeeping and 
videotaping), so that their performance can be evaluated from different perspectives: e.g., in 
baseball, the batting percentage with men on base, the number of runners left stranded by a 
player, the batting percentage against left handed vs. right handed pitchers, etc. Different 
statistics are used to make different decisions: should you keep the player or send him to the 
minors, where should he be in the batting order, in what situations should he be used as a pinch 
hitter, what should he practice, etc. In sports, all the variety of questions we try to answer on the
basis of tests in school are answered on the basis of analysis of actual performance in the field. 

In this scenario, students work with computers either in groups or individually. The 
teacher's role is that of coach rather than instructor. She will suggest tasks and activities for 
students to engage in, give them advice or help when they need it, and monitor how they are 
progressing. This scenario assumes that there is a variety of good educational software, as well 
as computational tools (e.g. word processors and writing coaches, statistical and graphing 
programs, and computer-based laboratories). The students would spend their day working with 
different programs, for example using the LISP Tutor or Quest, doing science projects with 
statistical and graphing programs, debating with students in other schools via electronic 
networks, etc. The computers and teachers would be assistants to the students' self learning. 

My claim is that in this environment all the functions of testing can be realized without 
students taking tests per se, and that all the desiderata for testing can be achieved without the bad 
side effects described. There are three kinds of measures that occur in this scenario that can be
used to carry out the multiple functions of testing:

1. Diagnosis. Diagnosis is distributed among computer, teacher, and students.
Many computer tutors, such as the LISP Tutor (Anderson, this volume), carry out
some form of diagnosis. In the case of the LISP Tutor the diagnosis is extremely
local; it only looks for specific errors students may make at each step and gives
advice accordingly. Other computer tutors, such as Sophie (Brown, Burton, &
de Kleer, 1982) and WEST (Burton & Brown, 1982), perform much more global
analyses of the students' misunderstandings and errors. Frederiksen and White (this
volume) suggest providing aids so that students can do self-diagnosis, which should
prove even more effective than computational analysis alone. Finally, the teacher
will be available to interact with students on a one-to-one basis as a coach, and
hence should be able to build up a better picture of the difficulties particular
students are having than in the traditional classroom.

2. Summary Statistics. As Anderson (this volume) has done with the LISP Tutor, it is
possible to keep records of what students do while they are learning and analyze 
these records to report to different audiences on students' progress. For example, a 
report to administrators might summarize how many students went all the way 
through the LISP Tutor and the Geometry Tutor, and how fast they went through 
each. A report to parents might describe what tutoring systems their child worked 
with, what kind of progress they made, how hard they tried (in terms of how long 
they stuck with various programs, particularly when they were having difficulty) 
and any other measures that parents request, individually or collectively. Reports 
to teachers might summarize the kinds of difficulties each student is having, and 
the amount each student learned using the different programs (in terms of the 
difference between their scores in the beginning and their scores in the final 
sessions). In fact, teachers could be given the capability of requesting that different
kinds of analyses be made on the data, just as Anderson did with the LISP Tutor.
There are other audiences and other ways of analyzing such data, but these
examples suffice to show what might be done.

3. Portfolios. Some computer-based teaching systems keep a library of students' best
work. As far as I know, the idea was first used in the Plato math curriculum 
(Dugdale and Kibbey, 1975). An excellent example of a library is the one in Green 
Globs, a game to teach analytic geometry developed by Sharon Dugdale. The 
game, which is a part of a larger set of computer-based activities to teach analytic 
geometry, requires students to write equations for curves to go through fifteen 
green globs placed randomly on a Cartesian plane. The more green globs any 
curve goes through (each glob only counts once), the more points students score
for that curve (the nth glob scores $2^{n-1}$ points). In the library are stored the highest-
scoring games played, showing where the globs were placed and the equations
written to make the high score. The name of the player who scored each game is
also listed, thus garnering fame for a good performance.

The concept of the library can be extended to the personal portfolio that students 
keep as a record of their accomplishments. The portfolio could record the students'
best compositions, game performances, or problem solutions. Art schools and
architecture firms require portfolios to help them determine who should be
admitted. This is because they know it is impossible to evaluate the creative skills
of a person in terms of standard tests. As we move to a society where learning and
thinking are critical, the same problems arise with standard tests in other domains.
So by basing placement decisions at least in part upon student portfolios (they may
also be based in part on the summary statistics described above), the decision will take into
account creativity as well as selectivity.

Moreover, basing decisions upon accomplishments rather than simply on measured 
aptitudes reflects more realistically the way decisions are made in the real world. 
We value employees or students who do good things, not those who merely have 
the capability to do good things. By stressing accomplishment in our decisions, we 
change the motivation structure for students in school. The emphasis will change 
from a concern with doing well on tests to producing good works. 
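Two of the quantities mentioned in this list are simple enough to sketch: the Green Globs score for an equation, and the learning gain reported to teachers as the difference between beginning and final scores. Both functions are illustrative assumptions rather than descriptions of the actual software. The scoring reads the rule as "the nth newly hit glob is worth $2^{n-1}$ points", treats globs hit by earlier equations as no longer counting, and uses a hypothetical tolerance `tol` in place of the glob's drawn size. The learning gain is taken as the mean final-session score minus the mean first-session score.

```python
from typing import Callable, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def curve_score(
    f: Callable[[float], float],
    globs: List[Point],
    already_hit: Set[Point],
    tol: float = 0.1,
) -> int:
    """Points for one equation y = f(x) under the assumed doubling rule.

    Each newly hit glob is added to `already_hit`, so it cannot score again.
    """
    score = 0
    n = 0
    for glob in globs:
        x, y = glob
        if glob not in already_hit and abs(f(x) - y) <= tol:
            n += 1
            score += 2 ** (n - 1)  # 1, 2, 4, 8, ... points
            already_hit.add(glob)
    return score


def learning_gain(first_scores: Sequence[float], final_scores: Sequence[float]) -> float:
    """Mean score in the final sessions minus mean score in the first sessions."""
    return sum(final_scores) / len(final_scores) - sum(first_scores) / len(first_scores)


globs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 0.0)]
hit: Set[Point] = set()
first = curve_score(lambda x: 2 * x + 1, globs, hit)   # y = 2x + 1 hits three globs
second = curve_score(lambda x: 2 * x + 1, globs, hit)  # same line again: nothing new
third = curve_score(lambda x: 0.0, globs, hit)         # y = 0 picks up the last glob
print(first, second, third)
print(learning_gain([40, 50], [70, 80]))
```

Under the doubling rule, a curve through m new globs earns $2^m - 1$ points, which is why a single clever equation through many globs is worth far more than several simple ones.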

In summary, let me review briefly how this scenario addresses the desiderata and concerns 
raised in the first two sections. The scenario entails moving away from testing per se to analysis 
of the ongoing learning and accumulation of the products produced by that learning. There is no 
lowering of standards to what can be measured, nor any overemphasis on doing well on tests. 
Moreover, there will be little stigmatizing of students for their poor performance; rather they will 
be rewarded for good products. The emphasis on learning and thinking will be central. 
Furthermore, the three kinds of measures discussed can validly serve all the purposes of testing
in today's schools. The scenario describes a truly integrated learning and testing environment.

4. Conclusion



The introduction of computers into our education system provides an opportunity to rethink 
the whole relationship between testing and learning. There are serious problems with the way 
testing currently drives our education system: it fosters emphasis on lower-order rather than
higher-order skills and encourages stigmatization of students who do not do well. Furthermore,
testing as presently construed has been concerned only with content and predictive validity rather
than with validity with respect to the many other purposes of testing. But by repositioning testing in
computer-based learning environments, many of the problems with testing as currently construed 
can be alleviated. 